{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create TensorFlow Deep Neural Network Model\n",
    "\n",
    "**Learning Objective**\n",
    "- Create a DNN model using the high-level Estimator API \n",
    "\n",
    "## Introduction\n",
    "\n",
    "We'll begin by modeling our data using a Deep Neural Network. To achieve this we will use the high-level Estimator API in Tensorflow. Have a look at the various models available through the Estimator API in [the documentation here](https://www.tensorflow.org/api_docs/python/tf/estimator). \n",
    "\n",
    "Start by setting the environment variables related to your project."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = \"cloud-training-demos\"  # Replace with your PROJECT\n",
    "BUCKET = \"cloud-training-bucket\"  # Replace with your BUCKET\n",
    "REGION = \"us-central1\"            # Choose an available region for Cloud MLE\n",
    "TFVERSION = \"1.14\"                # TF version for CMLE to use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ[\"BUCKET\"] = BUCKET\n",
    "os.environ[\"PROJECT\"] = PROJECT\n",
    "os.environ[\"REGION\"] = REGION\n",
    "os.environ[\"TFVERSION\"] = TFVERSION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "if ! gcloud storage ls | grep -q gs://${BUCKET}/; then\n",
    "    gcloud storage buckets create --location=${REGION} gs://${BUCKET}\n",
    "fi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "ls *.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create TensorFlow model using TensorFlow's Estimator API ##\n",
    "\n",
    "We'll begin by writing an input function to read the data and define the csv column names and label column. We'll also set the default csv column values and set the number of training steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "print(tf.__version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "CSV_COLUMNS = \"weight_pounds,is_male,mother_age,plurality,gestation_weeks\".split(',')\n",
    "LABEL_COLUMN = \"weight_pounds\"\n",
    "\n",
    "# Set default values for each CSV column\n",
    "DEFAULTS = [[0.0], [\"null\"], [0.0], [\"null\"], [0.0]]\n",
    "TRAIN_STEPS = 1000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the input function\n",
    "\n",
    "Now we are ready to create an input function using the Dataset API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_dataset(filename_pattern, mode, batch_size = 512):\n",
    "    def _input_fn():\n",
    "        def decode_csv(value_column):\n",
    "            columns = tf.decode_csv(records = value_column, record_defaults = DEFAULTS)\n",
    "            features = dict(zip(CSV_COLUMNS, columns))\n",
    "            label = features.pop(LABEL_COLUMN)\n",
    "            return features, label\n",
    "    \n",
    "        # Create list of files that match pattern\n",
    "        file_list = tf.gfile.Glob(filename = filename_pattern)\n",
    "\n",
    "        # Create dataset from file list\n",
    "        dataset = (tf.data.TextLineDataset(filenames = file_list)  # Read text file\n",
    "                     .map(map_func = decode_csv))  # Transform each elem by applying decode_csv fn\n",
    "\n",
    "        if mode == tf.estimator.ModeKeys.TRAIN:\n",
    "            num_epochs = None # indefinitely\n",
    "            dataset = dataset.shuffle(buffer_size = 10 * batch_size)\n",
    "        else:\n",
    "            num_epochs = 1 # end-of-input after this\n",
    "\n",
    "        dataset = dataset.repeat(count = num_epochs).batch(batch_size = batch_size)\n",
    "        return dataset\n",
    "    return _input_fn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the feature columns\n",
    "\n",
    "Next, we define the feature columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_categorical(name, values):\n",
    "    return tf.feature_column.indicator_column(\n",
    "        categorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key = name, vocabulary_list = values))\n",
    "\n",
    "def get_cols():\n",
    "    # Define column types\n",
    "    return [\\\n",
    "          get_categorical(\"is_male\", [\"True\", \"False\", \"Unknown\"]),\n",
    "          tf.feature_column.numeric_column(key = \"mother_age\"),\n",
    "          get_categorical(\"plurality\",\n",
    "                      [\"Single(1)\", \"Twins(2)\", \"Triplets(3)\",\n",
    "                       \"Quadruplets(4)\", \"Quintuplets(5)\",\"Multiple(2+)\"]),\n",
    "          tf.feature_column.numeric_column(key = \"gestation_weeks\")\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the Serving Input function \n",
    "\n",
    "To predict with the TensorFlow model, we also need a serving input function. This will allow us to serve prediction later using the predetermined inputs. We will want all the inputs from our user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def serving_input_fn():\n",
    "    feature_placeholders = {\n",
    "        \"is_male\": tf.placeholder(dtype = tf.string, shape = [None]),\n",
    "        \"mother_age\": tf.placeholder(dtype = tf.float32, shape = [None]),\n",
    "        \"plurality\": tf.placeholder(dtype = tf.string, shape = [None]),\n",
    "        \"gestation_weeks\": tf.placeholder(dtype = tf.float32, shape = [None])\n",
    "    }\n",
    "    \n",
    "    features = {\n",
    "        key: tf.expand_dims(input = tensor, axis = -1)\n",
    "        for key, tensor in feature_placeholders.items()\n",
    "    }\n",
    "    \n",
    "    return tf.estimator.export.ServingInputReceiver(features = features, receiver_tensors = feature_placeholders)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the model and run training and evaluation\n",
    "\n",
    "Lastly, we'll create the estimator to train and evaluate. In the cell below, we'll set up a `DNNRegressor` estimator and the train and evaluation operations. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train_and_evaluate(output_dir):\n",
    "    EVAL_INTERVAL = 300\n",
    "    \n",
    "    run_config = tf.estimator.RunConfig(\n",
    "        save_checkpoints_secs = EVAL_INTERVAL,\n",
    "    keep_checkpoint_max = 3)\n",
    "    \n",
    "    estimator = tf.estimator.DNNRegressor(\n",
    "        model_dir = output_dir,\n",
    "        feature_columns = get_cols(),\n",
    "        hidden_units = [64, 32],\n",
    "        config = run_config)\n",
    "    \n",
    "    train_spec = tf.estimator.TrainSpec(\n",
    "        input_fn = read_dataset(\"train.csv\", mode = tf.estimator.ModeKeys.TRAIN),\n",
    "        max_steps = TRAIN_STEPS)\n",
    "    \n",
    "    exporter = tf.estimator.LatestExporter(name = \"exporter\", serving_input_receiver_fn = serving_input_fn)\n",
    "    \n",
    "    eval_spec = tf.estimator.EvalSpec(\n",
    "        input_fn = read_dataset(\"eval.csv\", mode = tf.estimator.ModeKeys.EVAL),\n",
    "        steps = None,\n",
    "        start_delay_secs = 60, # start evaluating after N seconds\n",
    "        throttle_secs = EVAL_INTERVAL,  # evaluate every N seconds\n",
    "        exporters = exporter)\n",
    "        \n",
    "    tf.estimator.train_and_evaluate(estimator = estimator, train_spec = train_spec, eval_spec = eval_spec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we train the model!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the model\n",
    "shutil.rmtree(path = \"babyweight_trained_dnn\", ignore_errors = True) # start fresh each time\n",
    "train_and_evaluate(\"babyweight_trained_dnn\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When I ran it, the final RMSE (the average_loss) is about **1.16**. You can explore the contents of the `exporter` directory to see the contains final model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2017-2018 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
